Temporal Anchor Text as Proxy for Real User Queries
Authors
Abstract
Web archives preserve the fast-changing web. While web pages themselves can be archived, the popularity of past queries has usually not been preserved. Previous studies have observed the importance of anchor text for improving the quality of text search, and have shown that anchor text is similar to real user queries and document titles. Other studies have shown that document titles are similar to real user queries. In this paper, we propose an approach that uses temporal anchor text to reconstruct the information that a query log would have provided in the past. First, we study the link graph of four years of a Web archive to show how target hosts and anchor text evolve over time. Second, we investigate the importance of anchor text over time. Our approach is to rank anchor texts by their popularity in the archive at a specific time. Then, we check the importance of the top-ranked anchor texts on the public Web at the same time. To do this, we use the WikiStats dataset, which aggregates page views of Wikipedia pages. Using exact string matching between top-ranked anchor texts and Wikipedia titles in the WikiStats dataset, we find a high percentage of overlap (approximately 57%). Our data strengthens the hypothesis that anchor text may be used as a proxy for actual query volume.
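The ranking and matching steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_anchors` and `title_overlap`, the `(period, anchor)` tuple layout, the toy data, and the case-folding before comparison are all assumptions made for illustration. The abstract specifies only that anchor texts are ranked by popularity at a given time and compared to Wikipedia titles by exact string matching.

```python
from collections import Counter


def top_anchors(links, period, k):
    """Return the k most frequent anchor texts among links captured in `period`.

    `links` is an iterable of (period, anchor_text) pairs (hypothetical layout).
    Anchors are stripped and lower-cased first (an assumption, not from the paper).
    """
    counts = Counter(
        anchor.strip().lower()
        for captured, anchor in links
        if captured == period and anchor.strip()
    )
    return [anchor for anchor, _ in counts.most_common(k)]


def title_overlap(anchors, titles):
    """Fraction of anchors that exactly match one of the normalized titles."""
    if not anchors:
        return 0.0
    normalized = {title.strip().lower() for title in titles}
    return sum(anchor in normalized for anchor in anchors) / len(anchors)


# Toy data standing in for an archive link graph and a page-view title list.
links = [
    ("2010", "Barack Obama"),
    ("2010", "barack obama"),
    ("2010", "click here"),
    ("2010", "World Cup"),
    ("2011", "Royal Wedding"),
]
titles = ["Barack Obama", "World Cup", "Royal Wedding"]

ranked = top_anchors(links, "2010", k=3)
overlap = title_overlap(ranked, titles)
print(ranked, overlap)
```

In this toy run, navigational anchors such as "click here" lower the overlap, which suggests why a real pipeline might filter them out before matching; whether the paper does so is not stated in the abstract.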
Similar resources
Querytext – Using Queries and Clicks to Improve Text Matching for Web Search
User queries and their associated clicks have been extensively explored to improve Web search relevance. Very little existing work explores how user clicks can be used to improve text matching for Web search. In this paper, we treat user queries that result in clicks as off-page annotations. These queries, like anchor text, provide a valuable additional source of relevance information for Web p...
Events Retrieval Using Enhanced Semantic Web Knowledge
In this article, we present an experimental end-user application to query the DeRiVE 2011 challenge dataset in an innovative and intuitive manner. After enriching the dataset with external sources of information, it is indexed in a way that enables users to submit queries combining keywords, location and temporal anchor, in a single search field. The goal is to ease event retrieval providing a simp...
Developing a ChatBot to Answer Spatial Queries for use in Location-based Services
A chatbot is an automated operator that can interact with customers like a human operator, answer their questions, solve problems and get feedback. Real-time responsiveness and the sense of talking to a human are among its strengths, and both can be used to deliver location-based services. This paper designs a chatbot that can talk to users and answer their questions based on their location. Thi...
Automatic Query Type Identification Based on Click Through Information
We report on a study that was undertaken to better identify users’ goals behind web search queries by using click through data. Based on user logs which contain over 80 million queries and corresponding click through data, we found that query type identification benefits from click through data analysis; while anchor text information may not be so useful because it is only accessible for a smal...
University of Essex at the TREC 2012
The primary goal of our participation in the Session track is to further evaluate our anchor expansion technique proposed in the previous year [1]. In particular, we aim to test the effectiveness of this approach on a more realistic dataset collected this year. SWIRL 2012 noted that there is still a large gap between the study of users and the study of IR algorithms [2], so the session data col...